TL;DR: We used LangChain, OpenAI ChatGPT, Deep Lake, and Streamlit to create a web app that recommends Disney songs based on user input. There are three main approaches you could take to the problem, but not all of them work (we learned that the hard way). Looking to build a similar app? Read on.
A demo is on Hugging Face 🤗
Hey there! Today we will see how to leverage the Deep Lake vector database to create a document retrieval system. This will be different from your usual Question Answering demo app, where the user's query is applied directly to the embedded documents via LangChain. Instead, we will showcase how to leverage Large Language Models (LLMs) to encode our data in a way that makes matching easier, better, and faster.
Step by step, we'll unpack the behind-the-scenes of FairytaleDJ, a web app that recommends Disney songs based on user input. The goal is simple: we ask how the user feels, and we retrieve Disney songs that go "well" with that input. For example, if the user is sad, a song like "Reflection" from Mulan would probably be appropriate. Spotify, we're coming for you.
Just joking…
Or maybe not…
In any case, this kind of "document" retrieval is a perfect example of where vanilla Question Answering over docs fails. You won't get good results if you try to find similarities between users' feelings (like "Today I am great") and song lyrics: song embeddings capture everything in the lyrics, making them too "open" to pin down a single mood. Instead, we want to encode both user inputs and lyrics into a similar representation and then run the search. We won't spoil too much here, so it's shopping-list time. We mainly need three things: data, a way to encode it, and a way to match it against user input.
Getting the Data for the Song Recommendation Engine
To get our songs, we scraped https://www.disneyclips.com/lyrics/, a website containing the lyrics for all Disney songs ever made. The scraping code is here; it relies on asyncio to speed things up. We won't dwell on it, since it's not central to our story (plays Encanto music: we don't talk about asyncio, no, no, no…).
Then, we used the Spotify Python API to get the embed URL for each song in the "Disney Hits" playlist, and we removed every scraped song that wasn't in that playlist. This left us with 85 songs.
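For reference, here is a minimal sketch of what that lookup can look like with the spotipy client (the playlist ID below is a placeholder, and the matching against our scraped titles is omitted; this is not our exact script):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Credentials are read from the SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET env vars
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

# Placeholder: substitute the real "Disney Hits" playlist ID
results = sp.playlist_items("<DISNEY_HITS_PLAYLIST_ID>")

# Map each track name to its Spotify embed URL
embed_urls = {
    item["track"]["name"]: f"https://open.spotify.com/embed/track/{item['track']['id']}"
    for item in results["items"]
}
```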
We end up with a JSON file looking like this:
```json
{
  "Aladdin": [
    {
      "name": "Arabian Nights",
      "text": "Oh, I come from a land, from a faraway place. Where the caravan camels roam... ",
      "embed_url": "https://open.spotify.com/embed/track/0CKmN3Wwk8W4zjU0pqq2cv?utm_source=generator"
    },
    ...
  ],
  ...
}
```
Data Encoding for the Recommendation Engine
We evaluated several approaches to find the best way to retrieve the songs. For storage and search, we used the Activeloop Deep Lake vector database, or more specifically, its LangChain integration.
Creating the dataset is pretty straightforward. Given the previous JSON file, we embed the `text` field using `langchain.embeddings.openai.OpenAIEmbeddings` and add all the remaining keys/values as metadata:
```python
import json

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake


def create_db(dataset_path: str, json_filepath: str) -> DeepLake:
    with open(json_filepath, "r") as f:
        data = json.load(f)

    texts = []
    metadatas = []

    # One entry per song: the lyrics are the text we embed,
    # everything else becomes metadata
    for movie, lyrics in data.items():
        for lyric in lyrics:
            texts.append(lyric["text"])
            metadatas.append(
                {
                    "movie": movie,
                    "name": lyric["name"],
                    "embed_url": lyric["embed_url"],
                }
            )

    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

    db = DeepLake.from_texts(
        texts, embeddings, metadatas=metadatas, dataset_path=dataset_path
    )

    return db
```
To load it, we can simply:
```python
def load_db(dataset_path: str, *args, **kwargs) -> DeepLake:
    db = DeepLake(dataset_path, *args, **kwargs)
    return db
```
Our `dataset_path` is `hub://<ACTIVELOOP_ORGANIZATION_ID>/<DATASET_NAME>`, but you can also store the dataset locally. To store Deep Lake datasets locally, check out this doc here.
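For instance, creating the dataset and reloading it from a local path could look like this (the paths, filenames, and `read_only` flag here are illustrative assumptions):

```python
# Create the dataset on disk instead of in the Activeloop cloud
db = create_db(dataset_path="./disney-lyrics", json_filepath="lyrics.json")

# Later, load it back with the same embedding function
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
db = load_db("./disney-lyrics", embedding_function=embeddings, read_only=True)
```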
3 Approaches to Matching Moods to Songs
The next step was to find a way to match our songs with a given user input. In this tutorial, we tried three approaches, so you don't have to! Ultimately, we found an inexpensive approach that worked well qualitatively. But let's start with the failures 😅
What Didn’t Work
Similarity Search of Direct Embeddings
This approach was straightforward: we create embeddings for the lyrics and the user input with the same OpenAI embedding model and run a similarity search. Unfortunately, the suggestions were terrible, because we want to match the user's emotions to the songs rather than literally what they say.
For example, if we search for similar songs using "I am happy", we see very similar scores across all documents:
```python
matches = db.similarity_search_with_score("I am happy", distance_metric="cos", k=100)
```
If we plot the scores with a box plot, we see that they mostly sit around 0.74.
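A plot like that is easy to reproduce; here is a minimal matplotlib sketch, assuming the search result above was stored in `matches`:

```python
import matplotlib.pyplot as plt

# Distribution of the 100 cosine-similarity scores from the search above
scores = [score for _, score in matches]
plt.boxplot(scores)
plt.ylabel("Cosine similarity")
plt.show()
```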
Meanwhile, the first ten songs don't match the input particularly well:
```text
The World Es Mi Familia 0.7777353525161743
Go the Distance 0.7724394202232361
Waiting on a Miracle 0.7692896127700806
Happy Working Song 0.7679054141044617
In Summer 0.7620900273323059
So Close 0.7601353526115417
When I Am Older 0.7582702040672302
How Far I'll Go 0.7560539245605469
You're Welcome 0.7539903521537781
What Else Can I Do? 0.7535801529884338
```
Using ChatGPT as a Retrieval System
We also tried dumping all the lyrics into ChatGPT and asking it to return songs matching the user input. To fit within the 4096-token limit, we first had to create a one-sentence summary of each lyric; that brought each request to around 3k tokens ($0.006). The prompt template, which is very simple but very long, follows. The `{songs}` variable holds the JSON with all the songs.
```text
You act like a song retrieval system. We want to propose three songs based on the user input. We provide you a list of songs with their themes in the format <MOVIE_NAME>;<SONG_TITLE>:<SONG_THEMES>. To match the user input to the song, try to find themes/emotions from it and imagine what emotions the user may have and what song may be lovely to listen to. Add a bit of randomness to your decision.
If you don't find a match, provide your best guess. Try to look at each song's themes to offer more variations in the match. Please only output songs contained in the following list.

{songs}

Given an input, output three songs as a list that goes well with the input. The list of songs will be used to retrieve them from our database. The type of reply is List[str, str, str]. Please follow the following example formats.

Examples:
Input: "Today I am not feeling great."
["<MOVIE_NAME>;<SONG_TITLE>", "<MOVIE_NAME>;<SONG_TITLE>", "<MOVIE_NAME>;<SONG_TITLE>"]
Input: "I am great today"
["<MOVIE_NAME>;<SONG_TITLE>", "<MOVIE_NAME>;<SONG_TITLE>", "<MOVIE_NAME>;<SONG_TITLE>"]

The user input is {user_input}
```
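For completeness, here is a rough sketch of how such a prompt could be wired up and its reply parsed (`TEMPLATE` and `songs_summary_json` are assumed to hold the prompt above and the summarized songs JSON; this is not our exact code):

```python
import ast

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(input_variables=["songs", "user_input"], template=TEMPLATE)
retrieval_chain = LLMChain(llm=ChatOpenAI(temperature=0.7), prompt=prompt)

reply = retrieval_chain.run(songs=songs_summary_json, user_input="Today I am not feeling great.")
# The model replies with a Python-style list literal,
# e.g. '["Mulan;Reflection", "Frozen;Let It Go", "Moana;How Far I\'ll Go"]'
matched_songs = ast.literal_eval(reply)
```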
That did work okay-ish, but it was overkill. Later on, we also tried the emotional encoding discussed in the next section, which had comparable performance.
What Did Work: Similarity Search of Emotions Embeddings
Finally, we arrived at an approach that is inexpensive to run and gives good results: we convert each lyric to a list of eight emotions using ChatGPT. The prompt is the following:
```text
I am building a retrieval system. Given the following song lyric

{song}

You are tasked to produce a list of 8 emotions that I will later use to retrieve the song.

Please provide only a list of comma-separated emotions.
```
For example, for "Arabian Nights" from Aladdin (shown in the previous section), we obtained "nostalgic, adventurous, exotic, intense, romantic, mysterious, whimsical, passionate".
We generated these emotion lists for all 85 songs with GPT-3.5-turbo, embedded them with the same OpenAI embedding model as before, and stored them with Deep Lake. The entire script is here.
Next, we need to convert the user input into a list of emotions as well. For that, we again used ChatGPT with a custom prompt:
```text
We have a simple song retrieval system. It accepts eight emotions. You are tasked to suggest between 1 and 4 emotions to match the users' feelings. Suggest more emotions for longer sentences and just one or two for small ones, trying to condense the central theme of the input.

Examples:

Input: "I had a great day!"
"Joy"
Input: "I am exhausted today and not feeling well."
"Exhaustion, Discomfort, and Fatigue"
Input: "I am in Love"
"Love"

Please, suggest emotions for input = "{user_input}", and reply ONLY with a list of emotions/feelings/vibes.
```
Here we tasked the model with providing between one and four emotions; empirically, this worked best, given that most inputs are short.
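For context, the `chain` used in the snippets below can be assembled along these lines (a sketch: `EMOTION_TEMPLATE` stands for the prompt above, and the temperature is an assumption):

```python
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# EMOTION_TEMPLATE is the user-input prompt shown above, with {user_input} as its variable
prompt = PromptTemplate(input_variables=["user_input"], template=EMOTION_TEMPLATE)
chain = LLMChain(llm=ChatOpenAI(temperature=0.3), prompt=prompt)
```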
Let’s see some examples:
1"I'm happy and sad today" -> "Happiness, Sadness"
2"hey, rock you" -> "Energy, excitement, enthusiasm."
3"I need to cry" -> "Sadness, Grief, Sorrow, Despair."
4
Then we used these emotions to perform the similarity search on the vector database.
```python
user_input = "I am happy"
# We use ChatGPT to extract emotions from the user's input
emotions = chain.run(user_input=user_input)
# We find the k most similar songs
matches = db.similarity_search_with_score(emotions, distance_metric="cos", k=k)
```
These are the scores obtained from that search (`k=100`). They are much more spread apart, and the songs make more sense:
```text
Down in New Orleans (Finale) 0.9068354368209839
Happy Working Song 0.9066014885902405
Love is an Open Door 0.8957026600837708
Circle of Life 0.8907418251037598
Where You Are 0.8890194892883301
In Summer 0.8889626264572144
Dig a Little Deeper 0.8887585401535034
When We're Human 0.8860496282577515
Hakuna Matata 0.8856213688850403
The World Es Mi Familia 0.884093165397644
```
We also implemented some postprocessing. First, we filter out the low-scoring matches:
```python
from typing import List, Tuple

from langchain.schema import Document

# A list of (document, score) pairs, as returned by similarity_search_with_score
Matches = List[Tuple[Document, float]]


def filter_scores(matches: Matches, th: float = 0.8) -> Matches:
    return [(doc, score) for (doc, score) in matches if score > th]


matches = filter_scores(matches, 0.8)
```
To add more variety, i.e., to avoid always recommending the top match, we sample from the list of candidate matches. To do so, we first ensure the scores sum to one by dividing each score by their total:
```python
def normalize_scores_by_sum(matches: Matches) -> Matches:
    scores = [score for _, score in matches]
    tot = sum(scores)
    return [(doc, (score / tot)) for doc, score in matches]
```
Then we sample `n` songs using a modified version of `np.random.choice(..., p=scores)`: after every draw, we remove the element we just sampled, which ensures we never sample the same element twice.
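The sampler itself fits in a few lines; here is an illustrative version of `weighted_random_sample` (not necessarily the exact one in the repo):

```python
import numpy as np


def weighted_random_sample(items: np.ndarray, weights: np.ndarray, n: int) -> np.ndarray:
    """Sample n distinct items, with probability proportional to their weights."""
    indices = np.arange(len(items))
    picked = []
    for _ in range(n):  # assumes n <= len(items)
        # Renormalize the weights of the remaining elements
        p = weights[indices] / weights[indices].sum()
        choice = np.random.choice(indices, p=p)
        picked.append(choice)
        # Drop the sampled element so it cannot be drawn again
        indices = indices[indices != choice]
    return items[np.array(picked)]
```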
```python
docs, scores = zip(*matches)
docs = weighted_random_sample(
    np.array(docs), np.array(scores), n=number_of_displayed_songs
).tolist()
for doc in docs:
    print(doc.metadata["name"])
```
And finally, we have our songs. We then built a web app using Streamlit and hosted it on a Hugging Face Space. Go give it a try! :)
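For the curious, the front end boils down to very little code. Here is a minimal sketch of what the Streamlit app could look like, reusing the helpers defined above (the widget labels and the `k`/`n` values are illustrative):

```python
import numpy as np
import streamlit as st

st.title("FairytaleDJ")
user_input = st.text_input("How are you feeling today?")
if user_input:
    emotions = chain.run(user_input=user_input)
    matches = db.similarity_search_with_score(emotions, distance_metric="cos", k=20)
    matches = normalize_scores_by_sum(filter_scores(matches, 0.8))
    docs, scores = zip(*matches)
    for doc in weighted_random_sample(np.array(docs), np.array(scores), n=3).tolist():
        st.write(doc.metadata["name"])
```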
Conclusion: Technology Choice Matters When Building a Recommendation Engine with LangChain and Deep Lake
While we explained how to mix these technologies to create a song recommendation system, you can apply the same principles to many more use cases. With Deep Lake's multi-modality, you can store multiple embeddings for the same set of lyrics, or even incorporate additional signals, such as embeddings based on song tempo, instruments used, and more!
The main takeaway is understanding how to leverage LLMs to make the data work for you by transforming it to fit your task better. This was crucial for us: only after converting both the users' inputs and the songs' lyrics into lists of emotions were we able to get suitable matches.
That’s all, folks 🎉
Thanks for reading, and see you in the next one 💜
Francesco